Results 1 - 20 of 180
1.
Nucleic Acids Res ; 52(D1): D1062-D1071, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38000392

ABSTRACT

The SysteMHC Atlas v1.0 was the first public repository dedicated to mass spectrometry-based immunopeptidomics. Here we introduce the newly released SysteMHC Atlas v2.0 (https://systemhc.sjtu.edu.cn), a comprehensive collection of 7190 MS files from 303 allotypes. We extended and optimized a computational pipeline that allows the identification of MHC-bound peptides carrying unexpected post-translational modifications (PTMs), resulting in 471K modified peptides identified over 60 distinct PTM types. In total, we identified approximately 1.0 million and 1.1 million unique peptides for MHC class I and class II immunopeptidomes, respectively, representing 6.8-fold and 28-fold increases over v1.0. The SysteMHC Atlas v2.0 introduces several new features, including the inclusion of non-UniProt peptides and the incorporation of several novel computational tools for FDR estimation, binding affinity prediction and motif deconvolution. Additionally, we enhanced the user interface, upgraded the website framework, and provided external links to other related resources. Finally, we built and provided various spectral libraries as community resources for data mining and future immunopeptidomic and proteomic analysis. We believe that the SysteMHC Atlas v2.0 is a unique resource that provides key insights to the immunology and proteomics community and will accelerate the development of vaccines and immunotherapies.


Subjects
Databases, Protein; Peptides; Proteomics; Mass Spectrometry; Peptides/chemistry; Peptides/immunology; Protein Processing, Post-Translational; Proteomics/methods; Databases, Protein/standards; Internet; Humans; Animals
2.
Acta Crystallogr F Struct Biol Commun ; 77(Pt 7): 226-229, 2021 Jul 01.
Article in English | MEDLINE | ID: mdl-34196613

ABSTRACT

In macromolecular crystallography, paired refinement is generally accepted to be the optimal approach for the determination of the high-resolution cutoff. The software tool PAIREF provides automation of the protocol and associated analysis. Support for phenix.refine as a refinement engine has recently been implemented in the program. This feature is presented here using previously published data for thermolysin. The results demonstrate the importance of the complete cross-validation procedure to obtain a thorough and unbiased insight into the quality of high-resolution data.


Subjects
Crystallography, X-Ray/methods; Databases, Protein; Software; Crystallography, X-Ray/standards; Databases, Protein/standards; Software/standards
3.
Structure ; 29(4): 393-400.e1, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33657417

ABSTRACT

The Worldwide Protein Data Bank (wwPDB) has provided validation reports based on recommendations from community Validation Task Forces for structures in the PDB since 2013. To further enhance validation of small molecules as recommended by the 2016 Ligand Validation Workshop, wwPDB, Global Phasing Ltd., and the Noguchi Institute recently formed a public/private partnership to incorporate some of their software tools into the wwPDB validation package. Augmented wwPDB validation report features include: two-dimensional (2D) diagrams of small-molecule ligands and carbohydrates, highlighting geometric validation outcomes; 2D topological diagrams of oligosaccharides present in branched entities generated using 2D Symbol Nomenclature for Glycan representation; and views of 3D electron density maps for ligands and carbohydrates, illustrating the goodness-of-fit between the atomic structure and experimental data (X-ray crystallographic structures only). These improvements will impact confidence in ligand conformation and ligand-macromolecular interactions that will aid in understanding biochemical function and contribute to small-molecule drug discovery.


Subjects
Carbohydrates/chemistry; Databases, Protein/standards; Molecular Docking Simulation/methods; Proteomics/methods; Small Molecule Libraries/chemistry; Cheminformatics/methods; Databases, Chemical/standards; Humans; Ligands; Protein Binding; Proteome/chemistry; Proteome/metabolism
4.
Proteins ; 89(2): 242-250, 2021 02.
Article in English | MEDLINE | ID: mdl-32935893

ABSTRACT

A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some information consists of secondary, human-interpreted rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution on a cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. Systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as variation of protein locations across cell lines and tissues. Our results show that annotations of these two databases differ significantly in many cases, leading to proposed procedures for deriving and integrating the protein subcellular location data. We also find that proteins having highly variable locations are more likely to be biomarkers of diseases, providing support for incorporating analysis of subcellular location in protein biomarker identification and screening.


Subjects
Databases, Protein/standards; Molecular Sequence Annotation/standards; Proteins/metabolism; Atlases as Topic; Cell Compartmentation; Cell Line; Eukaryotic Cells/metabolism; Eukaryotic Cells/ultrastructure; Humans; Observer Variation; Proteins/chemistry; Proteins/genetics; Reproducibility of Results; Uncertainty
5.
J Immunother Cancer ; 8(2)2020 10.
Article in English | MEDLINE | ID: mdl-33109630

ABSTRACT

BACKGROUND: Checkpoint targets play a key role in tumor-mediated immune escape and therefore are critical for cancer immunotherapy. Unfortunately, there is a lack of bioinformatics resources that compile all the checkpoint targets for translational research and drug discovery in immuno-oncology. METHODS: To this end, we developed the checkpoint therapeutic target database (CKTTD), the first comprehensive database for immune checkpoint targets (proteins, miRNAs and LncRNAs) and their modulators. A scoring system was adopted to filter more relevant targets with high confidence. In addition, a few biological databases such as Oncomine, Drugbank, miRBase and Lnc2Cancer were integrated into CKTTD to provide in-depth information. Moreover, we computed and provided ligand-binding site information for all the targets, which may support bench scientists in drug discovery efforts. RESULTS: In total, CKTTD compiles 105 checkpoint protein targets, 53 modulators (small molecules and antibodies), 30 miRNAs and 18 LncRNAs in cancer immunotherapy with validated experimental evidence curated from 10 649 publications via an enhanced text-mining system. CONCLUSIONS: In conclusion, the CKTTD may serve as a useful platform for research on cancer immunotherapy and drug discovery. The CKTTD database is freely available to the public at http://www.ckttdb.org/.


Subjects
Databases, Protein/standards; Immunotherapy/methods; Humans
6.
BMC Bioinformatics ; 21(Suppl 13): 384, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32938375

ABSTRACT

BACKGROUND: Protein-DNA interaction governs a large number of cellular processes, and it can be altered by a small fraction of interface residues, i.e., the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical to understanding the principles of protein-DNA interactions. There are already some computational methods that can accurately and efficiently predict a large number of hot residues. However, the insufficiency of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the employed features limit the performance of existing methods. RESULTS: Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. This method, called PreHots (the abbreviation of Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (one benchmark for model training and one independent dataset for validation), which together consist of 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases with a strict process of redundancy removal. Our method achieves a sensitivity of 0.813 and an AUC score of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC score of 0.820 on the independent test dataset. The results show that our approach outperforms the existing ones. CONCLUSIONS: PreHots, which is based on a stacking ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with the existing methods, PreHots can achieve better prediction performance. Both the webserver of PreHots and the datasets are freely available at: http://dmb.tongji.edu.cn/tools/PreHots/.


Subjects
Algorithms; DNA-Binding Proteins/genetics; Databases, Protein/standards; Humans; Models, Molecular
7.
FEBS J ; 287(17): 3703-3718, 2020 09.
Article in English | MEDLINE | ID: mdl-32418327

ABSTRACT

A bright spot in the SARS-CoV-2 (CoV-2) coronavirus pandemic has been the immediate mobilization of the biomedical community, working to develop treatments and vaccines for COVID-19. Rational drug design against emerging threats depends on well-established methodology, mainly utilizing X-ray crystallography, to provide accurate structure models of the macromolecular drug targets and of their complexes with candidates for drug development. In the current crisis, the structural biological community has responded by presenting structure models of CoV-2 proteins and depositing them in the Protein Data Bank (PDB), usually without time embargo and before publication. Since the structures from the first-line research are produced in an accelerated mode, there is an elevated chance of mistakes and errors, with the ultimate risk of hindering, rather than speeding up, drug development. In the present work, we have used model-validation metrics and examined the electron density maps for the deposited models of CoV-2 proteins and a sample of related proteins available in the PDB as of April 1, 2020. We present these results with the aim of helping the biomedical community establish a better-validated pool of data. The proteins are divided into groups according to their structure and function. In most cases, no major corrections were necessary. However, in several cases significant revisions in the functionally sensitive area of protein-inhibitor complexes or for bound ions justified correction, re-refinement, and eventually reversioning in the PDB. The re-refined coordinate files and a tool for facilitating model comparisons are available at https://covid-19.bioreproducibility.org. DATABASE: Validated models of CoV-2 proteins are available in a dedicated, publicly accessible web service https://covid-19.bioreproducibility.org.


Subjects
Angiotensin-Converting Enzyme 2/chemistry; Antiviral Agents/chemistry; Coronavirus 3C Proteases/chemistry; Receptors, Virus/chemistry; SARS-CoV-2/chemistry; Spike Glycoprotein, Coronavirus/chemistry; Angiotensin-Converting Enzyme 2/antagonists & inhibitors; Angiotensin-Converting Enzyme 2/genetics; Angiotensin-Converting Enzyme 2/metabolism; Antiviral Agents/pharmacology; Binding Sites; COVID-19/virology; Coronavirus 3C Proteases/antagonists & inhibitors; Coronavirus 3C Proteases/genetics; Coronavirus 3C Proteases/metabolism; Cryoelectron Microscopy; Crystallography, X-Ray; Databases, Protein/standards; Drug Design; Humans; Ligands; Models, Molecular; Protease Inhibitors/chemistry; Protease Inhibitors/pharmacology; Protein Binding; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Interaction Domains and Motifs; Receptors, Virus/antagonists & inhibitors; Receptors, Virus/genetics; Receptors, Virus/metabolism; Spike Glycoprotein, Coronavirus/antagonists & inhibitors; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/metabolism; Thermodynamics
8.
FEBS J ; 287(13): 2685-2698, 2020 07.
Article in English | MEDLINE | ID: mdl-32311227

ABSTRACT

Crystallographic models of biological macromolecules have been ranked using the quality criteria associated with them in the Protein Data Bank (PDB). The outcomes of this quality analysis have been correlated with time and with the journals that published papers based on those models. The results show that the overall quality of PDB structures has substantially improved over the last ten years, but this period of progress was preceded by several years of stagnation or even depression. Moreover, the study shows that the historically observed negative correlation between journal impact and the quality of structural models presented therein seems to disappear as time progresses.


Subjects
Computational Biology/methods; Databases, Protein/standards; Macromolecular Substances/chemistry; Models, Molecular; Proteins/chemistry; Quality Control; Algorithms; Protein Conformation; Protein Domains
9.
Proteomics ; 20(10): e1900261, 2020 05.
Article in English | MEDLINE | ID: mdl-32249536

ABSTRACT

Proteogenomics is gaining momentum as, today, genomics, transcriptomics, and proteomics can be readily performed on any new species. This approach allows key alterations to molecular pathways to be identified when comparing conditions. For animals and plants, RNA-seq-informed proteomics is the most popular means of interpreting tandem mass spectrometry spectra acquired for species for which the genome has not yet been sequenced. It relies on high-performance de novo RNA-seq assembly and optimized translation strategies. Here, several pre-treatments for Illumina RNA-seq reads before assembly are explored to translate the resulting contigs into useful polypeptide sequences. Experimental transcriptomics and proteomics datasets acquired for individual Gammarus fossarum freshwater crustaceans are used; the most relevant procedure is defined by the ratio of MS/MS spectra assigned to peptide sequences. Removing reads with a mean quality score of less than 17 (which represents a single probable nucleotide error on 150-bp reads) prior to assembly increases the proteomics outcome. The best translation using Transdecoder is achieved with a minimal open reading frame length of 50 amino acids and systematic selection of ORFs longer than 900 nucleotides. Using these parameters, transcriptome assembly and translation informed by proteomics pave the way to further improvements in proteogenomics.


Subjects
Proteogenomics/methods; Proteomics; RNA-Seq; Transcriptome/genetics; Amino Acid Sequence/genetics; Animals; Computational Biology; Databases, Protein/standards; Genome/genetics; Genomics/trends; Sequence Analysis, RNA
10.
FEBS J ; 287(13): 2664-2684, 2020 07.
Article in English | MEDLINE | ID: mdl-31944606

ABSTRACT

Phosphatases play an essential role in the regulation of protein phosphorylation. Less abundant than kinases, many phosphatases are components of one or more macromolecular complexes with different substrate specificities and specific functionalities. The expert scientific curation of phosphatase complexes for the UniProt and Complex Portal databases supports the whole scientific community by collating and organising small- and large-scale experimental data from the scientific literature into context-specific central resources, where the data can be freely accessed and used to further academic and translational research. In this review, we discuss how the diverse biological functions of phosphatase complexes are presented in UniProt and the Complex Portal, and how understanding the biological significance of phosphatase complexes in Caenorhabditis elegans offers insight into the mechanisms of substrate diversity in a variety of cellular and molecular processes.


Subjects
Caenorhabditis elegans Proteins/metabolism; Caenorhabditis elegans/metabolism; Databases, Protein/standards; Multiprotein Complexes/metabolism; Phosphoric Monoester Hydrolases/metabolism; Protein Processing, Post-Translational; Animals; Caenorhabditis elegans Proteins/chemistry; Multiprotein Complexes/chemistry; Phosphoric Monoester Hydrolases/chemistry; Phosphorylation; Substrate Specificity
11.
BMC Bioinformatics ; 20(1): 228, 2019 May 06.
Article in English | MEDLINE | ID: mdl-31060495

ABSTRACT

BACKGROUND: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large-scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. RESULTS: Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Unlike previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method in a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. CONCLUSION: The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance than previous pipelines. The code is available on GitHub at: https://github.com/meringlab/og_consistency_pipeline.


Subjects
Databases, Protein/standards; Phylogeny
12.
J Proteome Res ; 18(3): 1019-1031, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30652484

ABSTRACT

In the current study, we show how ProCan90, a curated data set of HEK293 technical replicates, can be used to optimize the configuration options for algorithms in the OpenSWATH pipeline. Furthermore, we use this case study as a proof of concept for horizontal scaling of such a pipeline to allow 45 810 computational analysis runs of OpenSWATH to be completed within four and a half days on a budget of US $10 000. Through the use of Amazon Web Services (AWS), we have successfully processed each of the ProCan90 files with 506 combinations of input parameters. In total, the project consumed more than 340 000 core hours of compute and generated in excess of 26 TB of data. Using the resulting data and a set of quantitative metrics, we show an analysis pathway that allows the calculation of two optimal parameter sets, one for a compute-rich environment (where run time is not a constraint), and another for a compute-poor environment (where run time is optimized). For the same input files and the compute-rich parameter set, we show a 29.8% improvement in the number of quality protein (>2 peptide) identifications found compared to the current OpenSWATH defaults, with negligible adverse effects on quantification reproducibility or drop in identification confidence, and a median run time of 75 min (103% increase). For the compute-poor parameter set, we find a 55% improvement in the run time from the default parameter set, at the expense of a 3.4% decrease in the number of quality protein identifications, and an intensity CV decrease from 14.0% to 13.7%.


Subjects
Computational Biology/methods; Databases, Protein/standards; Datasets as Topic/standards; HEK293 Cells; Humans; Proteins/analysis; Proteomics/methods; Reproducibility of Results; Time Factors
13.
J Proteome Res ; 18(2): 585-593, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30560673

ABSTRACT

Decoy database search with target-decoy competition (TDC) provides an intuitive, easy-to-implement method for estimating the false discovery rate (FDR) associated with spectrum identifications from shotgun proteomics data. However, the procedure can yield different results for a fixed data set analyzed with different decoy databases, and this decoy-induced variability is particularly problematic for smaller FDR thresholds, data sets, or databases. The average TDC (aTDC) protocol combats this problem by exploiting multiple independently shuffled decoy databases to provide an FDR estimate with reduced variability. We provide a tutorial introduction to aTDC, describe an improved variant of the protocol that offers increased statistical power, and discuss how to deploy aTDC in practice using the Crux software toolkit.


Subjects
Databases, Protein/standards; Proteomics/methods; Software; Datasets as Topic; Humans; Models, Statistical; Reproducibility of Results
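
The target-decoy competition idea summarized in record 13 can be illustrated with a short sketch. This is a deliberately naive illustration, not the published aTDC protocol and not the Crux implementation: the function names, the tie-breaking rule, the +1 correction in the numerator, and the choice to average whole FDR estimates across decoy sets are all assumptions made here for clarity.

```python
from statistics import mean


def tdc_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR at a score threshold via target-decoy competition.

    target_scores[i] and decoy_scores[i] are the best target and best decoy
    match scores for spectrum i (higher is better). Each spectrum keeps only
    its higher-scoring match; ties go to the decoy, which is a conservative
    choice made for this sketch.
    """
    targets = 0
    decoys = 0
    for t, d in zip(target_scores, decoy_scores):
        if t > d:
            if t >= threshold:
                targets += 1
        elif d >= threshold:
            decoys += 1
    # The +1 keeps the estimate conservative when few decoys pass.
    return min(1.0, (decoys + 1) / max(targets, 1))


def averaged_fdr(target_scores, decoy_score_sets, threshold):
    """Average the TDC estimate over several independently shuffled decoy sets.

    This naive averaging only illustrates why multiple decoy sets reduce
    decoy-induced variability.
    """
    return mean(tdc_fdr(target_scores, d, threshold) for d in decoy_score_sets)


# Hypothetical scores for four spectra and two decoy databases.
targets = [10, 8, 6, 4]
decoy_sets = [[1, 9, 2, 3], [1, 2, 2, 3]]
print(averaged_fdr(targets, decoy_sets, threshold=5))
```

With a single decoy set the estimate depends heavily on which decoys happen to win; the two hypothetical decoy sets above give quite different single-set estimates, which is the variability that averaging is meant to dampen.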
14.
Autophagy ; 14(12): 2033-2034, 2018.
Article in English | MEDLINE | ID: mdl-30296899

ABSTRACT

I routinely see people use incorrect names for MAP1LC3/LC3 isoforms in scientific papers. In fact, it happens often enough that I decided to investigate the reason for the apparent confusion. It turns out that the sources of misinformation are abundant, including UniProt and antibody supplier web sites.


Subjects
Antibodies/classification; Microtubule-Associated Proteins/classification; Terminology as Topic; Autophagy-Related Proteins/chemistry; Autophagy-Related Proteins/immunology; Commerce/standards; Databases, Protein/classification; Databases, Protein/standards; Humans; Microtubule-Associated Proteins/chemistry; Microtubule-Associated Proteins/immunology; Protein Isoforms/classification; Protein Isoforms/immunology
15.
J Proteome Res ; 17(12): 4051-4060, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30270626

ABSTRACT

The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.


Subjects
Databases, Protein/standards; Peptide Library; Proteomics/methods; Animals; Humans; Tandem Mass Spectrometry/methods; Workflow
16.
Acta Crystallogr D Struct Biol ; 74(Pt 9): 939-945, 2018 Sep 01.
Article in English | MEDLINE | ID: mdl-30198902

ABSTRACT

The Protein Data Bank (PDB) constitutes a collection of the available atomic models of macromolecules and their complexes obtained by various methods used in structural biology, but chiefly by crystallography. It is an indispensable resource for all branches of science that deal with the structures of biologically active molecules, such as structural biology, bioinformatics, the design of novel drugs etc. Since not all users of the PDB are familiar with the methods of crystallography, it is important to present the results of crystallographic analyses in a form that is easy to interpret by nonspecialists. It is advisable during the submission of structures to the PDB to pay attention to the optimal placement of molecules within the crystal unit cell, to the correct representation of oligomeric assemblies and to the proper selection of the space-group symmetry. Examples of significant departures from these principles illustrate the potential for the misinterpretation of such suboptimally presented crystal structures.


Subjects
Databases, Protein/standards; Protein Conformation; Proteins/chemistry; Crystallography, X-Ray; Humans; Models, Molecular
17.
Acta Crystallogr F Struct Biol Commun ; 74(Pt 8): 463-472, 2018 08 01.
Article in English | MEDLINE | ID: mdl-30084395

ABSTRACT

Glycosylation is one of the most common forms of protein post-translational modification, but is also the most complex. Dealing with glycoproteins in structure model building, refinement, validation and PDB deposition is more error-prone than dealing with nonglycosylated proteins owing to limitations of the experimental data and available software tools. Also, experimentalists are typically less experienced in dealing with carbohydrate residues than with amino-acid residues. The results of the reannotation and re-refinement by PDB-REDO of 8114 glycoprotein structure models from the Protein Data Bank are analyzed. The positive aspects of 3620 reannotations and subsequent refinement, as well as the remaining challenges to obtaining consistently high-quality carbohydrate models, are discussed.


Subjects
Databases, Protein/classification; Databases, Protein/standards; Glycoproteins/chemistry; Glycoproteins/classification
18.
BMC Bioinformatics ; 19(1): 204, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29859055

ABSTRACT

BACKGROUND: Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. RESULTS: In this paper, we present a sequence-based method for the prediction of physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. This method was applied to 13 target proteins from the very recent Critical Assessment of Structure Prediction (CASP10/11), and our calculations gave a Matthews correlation coefficient (MCC) value of 0.66 for PFS prediction and an 80% recall in the prediction of the associated physicochemical interactions. CONCLUSIONS: Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved in the evolution of proteins. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49 .


Subjects
Databases, Protein/standards; Proteins/chemistry; Sequence Analysis, Protein/methods; Humans
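
Record 18 reports a Matthews correlation coefficient (MCC) and a recall value. For readers unfamiliar with these metrics, the sketch below computes both from confusion-matrix counts using their standard definitions. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def recall(tp, fn):
    """Fraction of true positives recovered: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    a common convention adopted for this sketch.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for a binary functional-site predictor.
print(recall(tp=8, fn=2))
print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike recall, MCC uses all four cells of the confusion matrix, so it stays informative when functional-site residues are far rarer than non-site residues.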
19.
Acta Crystallogr D Struct Biol ; 74(Pt 6): 531-544, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29872004

ABSTRACT

This article describes the implementation of real-space refinement in the phenix.real_space_refine program from the PHENIX suite. The use of a simplified refinement target function enables very fast calculation, which in turn makes it possible to identify optimal data-restraint weights as part of routine refinements with little runtime cost. Refinement of atomic models against low-resolution data benefits from the inclusion of as much additional information as is available. In addition to standard restraints on covalent geometry, phenix.real_space_refine makes use of extra information such as secondary-structure and rotamer-specific restraints, as well as restraints or constraints on internal molecular symmetry. The re-refinement of 385 cryo-EM-derived models available in the Protein Data Bank at resolutions of 6 Å or better shows significant improvement of the models and of the fit of these models to the target maps.


Subjects
Cryoelectron Microscopy/methods; Software; Animals; Computer Simulation; Crystallography/methods; Databases, Protein/standards; Humans; Macromolecular Substances/chemistry; Models, Molecular; TRPV Cation Channels/chemistry; Validation Studies as Topic
20.
Endocrinology ; 159(6): 2397-2407, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29718163

ABSTRACT

Nuclear receptors (NRs) are ligand-inducible transcription factors that play critical roles in metazoan development, reproduction, and physiology and therefore are implicated in a broad range of pathologies. The transcriptional activity of NRs critically depends on their interaction(s) with transcriptional coregulator proteins, including coactivators and corepressors. Short leucine-rich peptide motifs in these proteins (LxxLL in coactivators and LxxxIxxxL in corepressors) are essential and sufficient for NR binding. With 350 different coregulator proteins identified to date and with many coregulators containing multiple interaction motifs, an enormous combinatorial potential is present for selective NR-mediated gene regulation. However, NR-coregulator interactions have often been determined experimentally on a one-to-one basis across diverse experimental conditions. In addition, NR-coregulator interactions are difficult to predict because the molecular determinants that govern specificity are not well established. Therefore, many biologically and clinically relevant NR-coregulator interactions may remain to be discovered. Here, we present a comprehensive overview of 3696 NR-coregulator interactions by systematically characterizing the binding of 24 nuclear receptors with 154 coregulator peptides. We identified unique ligand-dependent NR-coregulator interaction profiles for each NR, confirming many well-established NR-coregulator interactions. Hierarchical clustering based on the NR-coregulator interaction profiles largely recapitulates the classification of NR subfamilies based on the primary amino acid sequences of the ligand-binding domains, indicating that amino acid sequence is an important, although not the only, molecular determinant in directing and fine-tuning NR-coregulator interactions. This NR-coregulator peptide interactome provides an open data resource for future biological and clinical discovery as well as NR-based drug design.


Subjects
Co-Repressor Proteins/genetics; Databases, Protein; Protein Interaction Mapping/methods; Receptors, Cytoplasmic and Nuclear/metabolism; Transcription Factors/genetics; Animals; Cluster Analysis; Co-Repressor Proteins/metabolism; Databases, Protein/standards; Databases, Protein/supply & distribution; Drug Design; Gene Expression Profiling; High-Throughput Screening Assays; Humans; Phylogeny; Protein Binding; Protein Domains; Receptors, Cytoplasmic and Nuclear/genetics; Transcription Factors/metabolism
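
The two motif patterns named in record 20 (LxxLL in coactivators, LxxxIxxxL in corepressors) can be located in a protein sequence with regular expressions. The sketch below is only an illustration of the pattern notation, assuming that "x" stands for any single residue; the helper name and the example sequence are hypothetical, and a sequence match alone does not imply nuclear receptor binding.

```python
import re

# "x" in the motif notation is assumed to mean any single residue.
# Lookahead groups let overlapping occurrences be reported as well.
COACTIVATOR_MOTIF = re.compile(r"(?=(L..LL))")
COREPRESSOR_MOTIF = re.compile(r"(?=(L...I...L))")


def find_motifs(sequence, pattern):
    """Return (1-based start position, matched residues) for each occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Made-up peptide containing one motif of each kind.
peptide = "AAALKELLAAALAAAIAAALAA"
print(find_motifs(peptide, COACTIVATOR_MOTIF))
print(find_motifs(peptide, COREPRESSOR_MOTIF))
```

Positions are reported 1-based to follow the usual convention for residue numbering.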